A robust multi-class AdaBoost algorithm for mislabeled noisy data
Authors
Abstract
AdaBoost has been proven, both theoretically and empirically, to be a very successful ensemble learning algorithm. It iteratively generates a set of diverse weak learners and combines their outputs by weighted majority voting to reach the final decision. In some cases, however, AdaBoost overfits, especially when the training examples are mislabeled (noisy), which degrades its generalization performance and makes it non-robust. Recently, a representative approach named noise-detection based AdaBoost (ND_AdaBoost) was proposed to improve the robustness of AdaBoost in the two-class classification scenario. In the multi-class scenario, however, this approach can hardly achieve satisfactory performance, for three reasons. (1) If a multi-class classification problem is decomposed with strategies such as one-versus-all or one-versus-one, the resulting two-class problems usually have imbalanced training sets, which hurts the performance of ND_AdaBoost. (2) If ND_AdaBoost is applied directly to the multi-class scenario, its two-class loss function is no longer applicable. Its accuracy requirement for the (weak) base classifiers, i.e., accuracy greater than 0.5, is also too strong and can hardly be satisfied. (3) ND_AdaBoost still tends to overfit, because it increases the weights of correctly classified noisy examples, which can make it focus on learning these noisy examples in subsequent iterations. To resolve this dilemma, in this paper we propose a robust multi-class AdaBoost algorithm (Rob_MulAda). Its key ingredients are a noise-detection based multi-class loss function and a new weight updating scheme. Experimental study indicates that our newly proposed weight updating scheme is indeed more robust to mislabeled noise than that of ND_AdaBoost, in both the two-class and multi-class scenarios.
In addition, comparison experiments verify the effectiveness of Rob_MulAda. We also suggest how to choose the most appropriate noise-alleviating approach according to the actual noise level in practical applications. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.
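The following is a minimal sketch, for illustration only, of the kind of baseline the abstract argues against. It implements SAMME (Zhu et al., 2009), a standard multi-class AdaBoost variant. It is not Rob_MulAda or ND_AdaBoost, whose loss functions and weight-updating rules are not given in this abstract.

The sketch shows three ideas from the text:
1. the weighted majority vote;
2. a multi-class learner weight alpha = ln((1 - eps)/eps) + ln(K - 1), which stays positive whenever the weighted error eps is below 1 - 1/K, so with K = 3 classes a base classifier only needs accuracy above 1/3 rather than 0.5;
3. the exponential up-weighting of misclassified, possibly mislabeled, examples.

The 1-D decision-stump weak learner, all function names and the toy data are hypothetical assumptions chosen for brevity.

```python
import math
from collections import defaultdict


def samme_alpha(err, n_classes):
    """SAMME learner weight: ln((1 - err) / err) + ln(K - 1).

    Positive exactly when err < 1 - 1/K (better than random guessing among
    K classes); for K = 2 it reduces to the familiar err < 0.5 condition.
    """
    return math.log((1.0 - err) / err) + math.log(n_classes - 1)


def weighted_vote(learners, alphas, x, classes):
    """Weighted majority vote: each learner adds its alpha to the class it predicts."""
    scores = {c: 0.0 for c in classes}
    for h, a in zip(learners, alphas):
        scores[h(x)] += a
    return max(classes, key=lambda c: scores[c])  # ties go to the first class


def fit_stump(X, y, w, classes):
    """Hypothetical weak learner for 1-D inputs: a single threshold t that
    predicts the class with the most weight on each side of t.
    Returns (predictor, weighted error)."""
    best = None
    for t in sorted(set(X)):
        left, right = defaultdict(float), defaultdict(float)
        for xi, yi, wi in zip(X, y, w):
            (left if xi <= t else right)[yi] += wi
        cl = max(classes, key=lambda c: left[c])
        cr = max(classes, key=lambda c: right[c])
        err = sum(wi for xi, yi, wi in zip(X, y, w)
                  if (cl if xi <= t else cr) != yi)
        if best is None or err < best[0]:
            best = (err, t, cl, cr)
    err, t, cl, cr = best
    return (lambda x, t=t, cl=cl, cr=cr: cl if x <= t else cr), err


def samme(X, y, n_rounds=10):
    """Plain (non-robust) SAMME with stumps; returns learners, alphas, classes, final weights."""
    classes = sorted(set(y))
    k = len(classes)
    if k < 2:
        raise ValueError("need at least two classes")
    w = [1.0 / len(X)] * len(X)  # uniform initial example weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h, err = fit_stump(X, y, w, classes)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero and log(0)
        if err >= 1.0 - 1.0 / k:  # no better than random guessing: stop
            break
        a = samme_alpha(err, k)
        learners.append(h)
        alphas.append(a)
        # Multiply each misclassified example's weight by exp(alpha). An example
        # whose (possibly wrong) label keeps disagreeing with the weak learners
        # therefore accumulates weight round after round.
        w = [wi * math.exp(a) if h(xi) != yi else wi for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return learners, alphas, classes, w


# Hypothetical 3-class toy data; index 1 is deliberately mislabeled as class 2.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 2, 0, 1, 1, 1, 2, 2, 2]
learners, alphas, classes, w = samme(X, y, n_rounds=1)
print(round(w[1], 4), round(w[3], 4))  # 0.2222 0.0556
```

With a single round, the best stump predicts class 1 for x <= 5 and class 2 otherwise, so it misclassifies indices 0, 1 and 2. Each of these, including the mislabeled point, ends up with weight 2/9, four times the 1/18 carried by each correctly classified point. The robust schemes described in the abstract replace this update rule; their exact form is not reproduced in this sketch.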
Similar resources
A Robust Boosting Method for Mislabeled Data
We propose a new, robust boosting method that uses a sigmoidal function as its loss function. In deriving the method, the stagewise additive modelling methodology is blended with gradient descent algorithms. Based on intensive numerical experiments, we show that the proposed method is actually better than AdaBoost and other regularized methods in test error rates in the case of noisy, ...
A Multiclass Extension To The Brownboost Algorithm
Brownboost is an adaptive, continuous-time boosting algorithm based on the Boost-by-Majority (BBM) algorithm. Though it has been little studied at the time of writing, it is believed that it should prove especially robust with respect to noisy data sets. This would make it a very useful boosting algorithm for real-world applications. More familiar algorithms such as AdaBoost, or its successor L...
Robust multi-class boosting
Boosting approaches are based on the idea that high-quality learning algorithms can be formed by repeated use of a “weak-learner”, which is required to perform only slightly better than random guessing. It is known that Boosting can lead to drastic improvements compared to the individual weak-learner. For two-class problems it has been shown that the original Boosting algorithm, called AdaBoost...
A noise filtering method using neural networks - Soft Computing Techniques in Instrumentation, Measurement and Related Applications, 2003. SCIMA 20
During the data collecting and labeling process, it is possible for noise to be introduced into a data set. As a result, the quality of the data set degrades, and experiments and inferences derived from the data set become less reliable. In this paper we present an algorithm, called ANR (automatic noise reduction), as a filtering mechanism to identify and remove noisy data items whose classes h...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Journal: Knowledge-Based Systems (Knowl.-Based Syst.)
Volume: 102
Publication date: 2016